Long Short-Term Memory for Japanese Word Segmentation

Authors

  • Yoshiaki Kitagawa
  • Mamoru Komachi
Abstract

This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) have successfully used recurrent neural networks such as LSTM and gated recurrent units (GRU). However, in contrast to Chinese, Japanese includes several character types, such as hiragana, katakana, and kanji, which produce orthographic variations and increase the difficulty of word segmentation. Additionally, JWS tasks need to consider global context, yet traditional JWS approaches rely on local features. To address this problem, this study proposes an LSTM-based approach to JWS. The experimental results indicate that the proposed model achieves state-of-the-art accuracy on various Japanese corpora.
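The point about multiple character types can be made concrete with a small illustration. The sketch below is not taken from the paper; it simply assigns each character a script label using the Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks. The function names `char_type` and `char_types` are hypothetical, and the range check is a simplification that ignores rarer blocks such as the CJK extensions and half-width katakana.

```python
# Illustrative only: a simplified script classifier, not the authors' method.
# Ranges follow the Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks.

def char_type(ch: str) -> str:
    """Return a coarse script label for a single character."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def char_types(text: str) -> list[str]:
    """Label every character in the text with its script type."""
    return [char_type(ch) for ch in text]


if __name__ == "__main__":
    # One sentence can mix all three scripts.
    for ch, label in zip("東京タワーに行く", char_types("東京タワーに行く")):
        print(ch, label)

    # Orthographic variation: the word for "apple" written in three scripts.
    for variant in ["りんご", "リンゴ", "林檎"]:
        print(variant, sorted(set(char_types(variant))))
```

The three spellings of "apple" receive three different script labels even though they are the same word, which is the kind of orthographic variation the abstract describes as making segmentation harder.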

Similar resources

Long Short-Term Memory Neural Networks for Chinese Word Segmentation

Currently, most state-of-the-art methods for Chinese word segmentation are based on supervised learning, whose features are mostly extracted from a local context. These methods cannot utilize long-distance information, which is also crucial for word segmentation. In this paper, we propose a novel neural network model for Chinese word segmentation, which adopts the long short-term memory (LST...

DAG-based Long Short-Term Memory for Neural Word Segmentation

Neural word segmentation has attracted growing research interest for its ability to reduce the effort of feature engineering and to exploit external resources through pre-trained character or word embeddings. In this paper, we propose a new neural model that incorporates word-level information for Chinese word segmentation. Unlike the previous word-based models, our model still adopts t...

Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation

Recently, many neural network models have been applied to Chinese word segmentation. However, such models focus more on collecting local information, while long-distance dependencies are not well learned. To integrate local features with long-distance dependencies, we propose a dependency-based gated recursive neural network. Local features are first collected by bi-directional long short term m...

Verbal-Auditory Skills in 5-year-Old Children of Semnan/Iran in 2006

Introduction: This research was planned to determine some verbal-auditory skills (verbal-auditory short memory and phonological awareness) that have the closest relationship with speech and language development in 5-year-old children. Method: In this descriptive cross-sectional study, 400 children of pre-school classes affiliated to Education and Welfare organizations in Semnan city were select...

Evaluating the Success of the Visual Learners in Vocabulary Learning through Word List versus Sentence Making Approaches.

This study sought to evaluate the achievements of learners with the visual learning style when exposed to the sentence making and word list approaches. To that end, 45 basic level participants who studied at the Iran Language Institute (ILI), Bushehr, took part in this research study. At the outset, the learners were given the Barsch learning style inventory (1991) to determine the learners'...

Journal title:
  • CoRR

Volume: abs/1709.08011

Publication date: 2017